Preface


This thesis is an extensive survey of availability as an
important measure of system effectiveness. Availability appears
to be a more appropriate measure than reliability for measuring
the effectiveness of maintained systems because it includes
reliability as well as maintainability.
Abstract

Availability appears to be a more appropriate measure than
reliability for measuring the effectiveness of maintained
systems because it includes reliability as well as
maintainability. This thesis is a survey and a systematic
classification of the literature relevant to availability.
Emphasis is placed on a variety of topics related to
availability. The topics discussed are: the definition and
concepts of availability; the probability density functions of
failure times and of repair times; system configurations; the
various approaches employed to obtain availability models; the
effect of preventive maintenance policies on availability; the
availability parameters in the model; and system optimization.
AVAILABILITY OF MAINTAINED SYSTEMS


CHAPTER  I 

INTRODUCTION 

Increasing complexity of modern-day equipment, both in the
military and commercial areas, has brought with it new
engineering problems involving high performance, reliability,
and maintainability. Reliability has long been considered as a
measure of system effectiveness. However, it has proved to be an
incomplete measure of effectiveness because it does not consider
maintainability, another measure of system performance. With
increasing complexity and the resulting high operational and
maintenance costs, greater emphasis has been placed on reducing
system maintenance while improving reliability. In this regard,
availability, which is a combined measure of reliability and
maintainability, has received wide usage as a measure of the
effectiveness of maintained systems.

This thesis is a survey and a systematic classification of the
literature relevant to availability. Emphasis is placed on a
variety of topics related to availability. Chapter II covers the
basic concepts: the definition and concepts of availability,
failure and repair time distributions, and system
configurations. In Chapter III, the different approaches used in
obtaining availability models are discussed. In Chapter IV, many
availability models using the Markovian approach are discussed.
In Chapter V, the effect of preventive maintenance policies on
availability is explained, and a classification of the
availability parameters used in the model and of system
optimization is presented.
CHAPTER II


SURVEY  ON  BASIC  ELEMENTS  OF  AVAILABILITY 

In  describing  the  availability  of  a  given  system 
it  is  necessary  to  specify  three  things: 

1.  The  component  failure  process, 

2.  The  repair  or  maintenance  process,  and 

3.  System  configuration. 

In this chapter, these three characteristics will be studied;
but before exploring them, we discuss the various definitions of
availability.


Definition and Concepts of Availability
There are two classifications for availability.

Classification 1

In this classification the definition depends on the time
interval; availability is classified into three categories
(Figure 2.1): (1) instantaneous availability, (2) average uptime
availability, and (3) steady-state availability [135].

1. Instantaneous availability, A(t), is defined as the
probability that the system is operational at any random time,
t.

2. Average uptime availability, A(T), is the proportion of time
in a specified interval (0, T) that the system is available for
use and is expressed as:

$$A(T) = \frac{1}{T} \int_0^T A(t)\,dt \tag{2.1}$$

3. Steady-state availability, A(∞), is the average uptime
availability as T → ∞ and is given by:

$$A(\infty) = \lim_{T \to \infty} A(T) \tag{2.2}$$

The  representation  of  availability  which  is  appropriate 
depends  upon  the  system  mission  and  its  conditions  of  use. 
The steady-state availability may be the most satisfactory
measure for systems which are to be operated continuously.

The  average  uptime  may  be  the  most  satisfactory  measure 
for  systems  whose  usage  is  defined  by  a  duty  cycle.  For 
systems  which  are  required  to  perform  a  function  at  any 
random  time,  the  instantaneous  availability  may  be  the 
most  satisfactory  measure. 

Classification  2 

In this classification the definition depends on the type of
downtime. Availability is again classified into three categories
(Figure 2.2): (1) inherent availability, (2) achieved
availability, and (3) operational availability.

1. Inherent availability, A_i, takes into account corrective
maintenance downtime only. It can be expressed as:

$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} \tag{2.3}$$

where:

MTBF = mean time between failures, and

MTTR = mean time to repair.

2. Achieved availability, A_a, is defined as the probability
that a system, when used under stated conditions in an ideal
support environment (i.e., available tools, spares, manpower,
etc.), will operate satisfactorily at a given point in time. It
excludes logistic time and waiting or administrative downtime.
It includes active preventive and corrective maintenance
downtime. It can be expressed as:

$$A_a = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + M} \tag{2.4}$$
where:

MTBM = mean time between maintenance, and

M = mean maintenance time resulting from both corrective and
preventive maintenance actions.

3. Operational availability, A_o, is defined as the probability
that a system, when used under stated conditions in an actual
operational environment, will operate satisfactorily at a given
point in time. It includes ready time, logistic time, and
waiting or administrative downtime. It can be expressed as:

$$A_o = \frac{\mathrm{MTBM} + \text{Ready Time}}{(\mathrm{MTBM} + \text{Ready Time}) + \mathrm{MDT}} \tag{2.5}$$

where:

Ready time = the time in which the system is ready but not in
operation,

MDT = maintenance downtime including logistic downtime and
waiting or administrative time, and

MDT = M + delay time.

Operational  availability  appears  to  be  a  more 
realistic  measure  than  the  other  two  measures.  However, 
because  delay  time  is  determined  by  administrative  and 
supply  factors  which  depend  on  the  environment  of  the 
system, this definition will not be used.
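
As a rough illustration of how the three measures in this classification compare, the following Python sketch evaluates them for one set of invented figures. The function names and every numerical value are hypothetical, and the inherent form is assumed to be the usual ratio MTBF / (MTBF + MTTR).

```python
def inherent_availability(mtbf, mttr):
    """MTBF / (MTBF + MTTR): corrective maintenance downtime only."""
    return mtbf / (mtbf + mttr)


def achieved_availability(mtbm, m):
    """MTBM / (MTBM + M): active corrective and preventive maintenance."""
    return mtbm / (mtbm + m)


def operational_availability(mtbm, ready_time, m, delay_time):
    """(MTBM + ready time) / ((MTBM + ready time) + MDT), MDT = M + delay."""
    mdt = m + delay_time
    up = mtbm + ready_time
    return up / (up + mdt)


# Hypothetical figures, all in hours (not taken from any reference).
a_i = inherent_availability(mtbf=500.0, mttr=5.0)
a_a = achieved_availability(mtbm=200.0, m=4.0)
a_o = operational_availability(mtbm=200.0, ready_time=0.0, m=4.0,
                               delay_time=6.0)

print(f"inherent    = {a_i:.4f}")
print(f"achieved    = {a_a:.4f}")
print(f"operational = {a_o:.4f}")
```

With the same MTBM and M, adding delay time can only lower the operational figure relative to the achieved one, which is why the operational measure depends on the support environment.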


The  Failure  Process  Distributions 


The failure time distributions describe the component failure
process; i.e., the probability law governing failures. There are
two ways of postulating a component failure distribution:

1.  Physical  reasoning  theory.  In  this  method, 
we  depend  on  physical  reasoning  to  assume  a  form  of  the 
failure  distribution.  This  method  is  useful  when  there  is 
little  a  priori  information. 

2.  Using  observed  empirical  evidence.  In  this 
method,  attempts  can  be  made  to  fit  a  failure  density 
function  to  the  available  data. 

Of course, a combination of these two methods is optimal if
sufficient statistical data are available and insight into the
failure distribution can be obtained by physical theory.

Many  types  of  failure  distributions  have  been 
used  in  the  literature.  Classification  of  references  on 
availability  according  to  various  types  of  failure  time 
distributions (exponential, Erlang, Weibull, gamma,
Rayleigh,  normal,  log-normal,  uniform,  extreme  value, 
and  general)  is  given  in  Table  2.1. 

TABLE 2.1

CLASSIFICATION OF REFERENCES ON AVAILABILITY
WITH REGARD TO FAILURE TIME DISTRIBUTIONS

Name of
Distribution       References

Exponential        1-4, 7-10, 14, 16, 18, 20-25, 28, 29, 35, 39,
                   41-43, 47, 48, 50, 53-57, 59, 60, 63, 65-70,
                   74-77, 83, 86-88, 90, 93, 94, 96, 97, 103,
                   106, 109, 112, 113-122, 126-128, 130, 137,
                   139, 140, 143-145, 150, 152, 154-158, 164,
                   165, 167, 168-173, 175-179, 192, 193

Erlang             41, 91, 104, 151, 157, 165

Weibull            10, 16, 41, 88, 112, 113, 157, 165, 179, 193,
                   196

Gamma              10, 16, 41, 88, 112, 113, 157, 165, 179, 182

Rayleigh           112, 116, 165

Normal             10, 16, 21, 41, 56, 112, 113, 117, 165, 179,
                   182

Log-Normal         10, 14, 16, 40, 58, 113

Uniform            27, 116, 165

Extreme Value      10, 113

General            19, 20, 30, 47, 51, 66-68, 105, 110-112, 126,
(Arbitrary)        131, 133-136, 142, 144, 162, 166, 190

The most frequently employed distribution is the negative
exponential distribution. To justify the use of the exponential
failure law, much experimental and operational data have been
collected. One of the earliest reports of a statistical nature
was made by Davis [49], and subsequent studies by Carhart [37]
and Boodman [22] indicate that this distribution adequately fits
failure experience. Cox and Smith [46] demonstrate that
equipment generally will exhibit the exponential failure pattern
provided that the components are replaced as they fail, even
though certain components within the equipment may not exhibit
it.

This  distribution  seems  to  apply  to  all  electronic 
equipment.  The  rationale  behind  this  is  that  the  electronic 
components  do  not  fail  from  wearout  or  fatigue,  but  from 
being overstressed; and these overstressed conditions are
purely randomly distributed. In addition, all military
standards and 90 percent of the military reliability
calculations are based on random failures [112]. The most
attractive  feature  in  using  the  exponential  distribution 
is  that  it  enables  one  to  deal  with  a  constant  failure 
rate.  Hence,  it  provides  an  advantage  from  a  mathematical 
tractability  point  of  view  even  though  it  is  not  always 
justified. 

Bocchi [21] demonstrated the suitability of using the
exponential failure distribution for mechanical reliability
prediction. The rationale is that during the useful life period,
when failures due to poor quality and wearout are few, failure
rates should tend to be roughly constant. The main contributor
to the failure rate is then random high stress levels that
exceed the strength of the components. Other items whose
failures justify the use of the exponential failure distribution
are tube puncture, capacitor breakdown, fuse blowout, many
aircraft and missile parts, airborne radars, and fire control
systems. References that justify the use of the exponential
failure distribution are References 22, 37, 46, 49, and 196.

After the exponential distribution, the Weibull distribution is
probably the most widely used distribution. The hazard function
of the Weibull, given by

$$h(t) = \frac{\beta}{\theta} \left( \frac{t}{\theta} \right)^{\beta - 1}, \qquad t > 0 \tag{2.6}$$

will decrease in time if β < 1, will increase if β > 1, or will
be constant if β = 1, which is the exponential case. The Weibull
distribution has been used to describe fatigue failure, vacuum
tube failure, and ball bearing failure. It is the most popular
parametric family of failure distributions.
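
The three regimes of the Weibull hazard can be checked numerically with a short Python sketch. The parameter names `beta` (shape) and `theta` (scale), the scale value of 100 hours, and the sample times are assumptions made only for this illustration.

```python
def weibull_hazard(t, beta, theta):
    """Weibull hazard h(t) = (beta / theta) * (t / theta) ** (beta - 1)."""
    if t <= 0:
        raise ValueError("the hazard is defined here for t > 0 only")
    return (beta / theta) * (t / theta) ** (beta - 1)


sample_times = (10.0, 50.0, 200.0)  # hypothetical operating times, hours

# One list of hazard values per shape parameter.
hazards = {
    beta: [weibull_hazard(t, beta, theta=100.0) for t in sample_times]
    for beta in (0.5, 1.0, 2.0)
}

for beta, values in hazards.items():
    print(beta, [f"{v:.5f}" for v in values])
```

A shape parameter of 2 gives a hazard that grows linearly in t, which is the Rayleigh case.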

The Rayleigh distribution is a single-parameter density which
holds for a component with a linearly increasing failure rate
(λt).

The rectangular or uniform distribution may well be employed if
every component has the same failure rate or each item takes
equally long to repair.

The Erlang distribution is used to describe both the failure and
repair times. Kodama [104] used the Erlang as a failure
distribution. Since the Erlang distributions are a special case
of the incomplete gamma distributions (the shape parameter is an
integer), they will fit many and perhaps most of the
distributions encountered in practice, and mathematical
treatment will be easy.

The normal distribution describes wearout failures. By wearout
failures we mean those cases in which no overt or abrupt failure
has occurred but the item has more or less gradually reached the
failed state through the deterioration or depletion of some
quantity, structure, or function necessary for useful operation.
In this type of failure the component deaths tend to cluster
around a mean lifetime, $\bar{t}$, with half the failures
occurring before and half afterward. There are few very early or
very late failures, the failure density being low initially and
reaching a maximum at the mean lifetime. The hazard is very low
initially and rises rapidly after $\bar{t}$. This familiar
pattern of failure can be described by the normal distribution
[37], in which the failure density as a function of operating
time, t, is given by:

$$f(t) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{t - \bar{t}}{\sigma} \right)^2 \right] \tag{2.7}$$

The normal distribution failure pattern applies to systems which
exhibit small variation in failure resistance among the
individuals within a population and which are subject to small
variations in environmental severity. Further, the failure
resistance of the mechanism deteriorates with time, and
operational procedure requires that each item be used until
ultimate failure. Davis [49] states that the normal distribution
characterizes the failure of dry cells and light bulbs. Bell
[16] also mentions that vacuum tubes used in commercial and
military electronic equipment follow the normal failure law, as
does a significant fraction of commercial aircraft parts.

Many life length distributions occurring in practical
applications are obviously not normal because they are markedly
skewed, whereas the normal distribution is symmetric. The gamma
family of distributions is skewed and therefore may seem more
natural than the normal family in these cases.

The gamma density function is described by:

$$f(t) = \frac{\lambda (\lambda t)^{\alpha - 1} e^{-\lambda t}}{\Gamma(\alpha)}, \qquad \lambda, \alpha > 0, \quad t \ge 0 \tag{2.8}$$

The gamma has an increasing failure rate for α > 1 and, in this
case, the failure rate is bounded above by λ; for α < 1, the
failure rate is decreasing.

The log-normal density is defined as:

$$f(t) = \frac{1}{t \sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2\sigma^2} (\log t - \mu)^2 \right], \qquad -\infty < \mu < \infty, \quad \sigma > 0, \quad t > 0 \tag{2.9}$$

This is a skew distribution in which both long and short
downtimes occur more frequently than would be the case in data
with the same value fitted to an exponential distribution. The
failure rate of the log-normal distribution increases at first
and then eventually decreases to zero. For this reason, the
log-normal has found disfavor as a failure distribution. It has
been proposed, however, as a reasonable family of distributions
for describing the length of time to repair a piece of
equipment, and there is some empirical evidence for this
assertion [10].
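
The rise and eventual fall of the log-normal failure rate can be seen in a small numerical sketch. The parameter values (μ = 0, σ = 1) and the sample times are arbitrary choices for illustration; the hazard is computed as the density divided by the survival function, using only the Python standard library.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Log-normal density f(t) for t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_survival(t, mu, sigma):
    """Survival function 1 - F(t), via the complementary error function."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_hazard(t, mu, sigma):
    """Failure rate h(t) = f(t) / (1 - F(t))."""
    return lognormal_pdf(t, mu, sigma) / lognormal_survival(t, mu, sigma)


times = [0.1, 1.0, 10.0, 100.0]  # arbitrary sample times
hazards = [lognormal_hazard(t, mu=0.0, sigma=1.0) for t in times]

for t, h in zip(times, hazards):
    print(f"t = {t:6.1f}   h(t) = {h:.4f}")
```

For these parameters the hazard is larger at t = 1 than at t = 0.1 and then falls steadily at t = 10 and t = 100, which is the behavior that makes the log-normal unattractive for wearout failures but plausible for repair times.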

Many authors, including Coppola [45] and Howard [92], indicate
that downtimes are generally well fitted by a log-normal
distribution. Shelley [163] pointed out that the log-normal fits
the cargo aircraft data closely, especially at the upper
percentile points. Recent reliability studies on various
potential communication systems indicate that many semiconductor
devices have lifetime distributions well represented by the
log-normal [40].

On the basis of actual observation of time to failure it is
difficult to distinguish among the various nonsymmetrical
probability functions. The differences among the gamma, Weibull,
and log-normal distribution functions become significant only in
the tails of the distribution, but actual observations are
sparse in the tails because of limited sample sizes.

The  Repair  Process  Distributions 
Table 2.2 shows the classification of references on availability
with regard to a variety of repair time distributions:
exponential, Erlang, Weibull, gamma, Rayleigh, normal,
log-normal, uniform, and general.

The exponential distribution is used as a theoretical
distribution for the repair time because of its analytical
properties and computational convenience [188]. Rohn [154]
maintains that the essential characteristic of repair times of
complex electronic equipment is a high frequency of short repair
times and a few long repair times; this type of behavior
suggests representation by an exponential distribution.

As mentioned before, the log-normal distribution is quite
popular for the distribution of repair times. In many
situations, repair times are best described by the log-normal
distribution, and many authors [45, 92, 163, 179, 187] justify
the use of the distribution. Studies on airborne radar equipment
and ground equipment for surface-to-air missile systems have
indicated observed repair time distributions that best fit the
log-normal distribution [77, 162].
TABLE 2.2

CLASSIFICATION OF REFERENCES ON AVAILABILITY
WITH REGARD TO REPAIR TIME DISTRIBUTIONS

Name of
Distribution       References

Exponential        1-4, 7, 10, 18, 23, 24, 25, 35, 39, 43, 50,
                   53-56, 59, 63, 68-70, 74-75, 86-88, 90, 93,
                   94, 103, 107, 112, 114, 116, 118, 120, 122,
                   127, 137, 139, 140, 143, 154, 156-158, 165,
                   172, 173, 175, 188, 192, 193

Erlang             69, 122, 126, 144

Weibull            29, 112, 193

Gamma              24, 29, 116, 140, 144, 146, 157

Rayleigh           112, 116

Normal             14, 20, 47, 56, 112

Log-Normal         10, 20, 29, 47, 56, 60, 83, 88, 102, 179

Uniform            116, 122

General            10, 19, 28, 30, 42, 43, 48, 51, 65, 74, 76,
(Arbitrary)        96, 97, 104, 105, 106, 109, 110-112, 119,
                   121, 126, 130, 131, 133-136, 142, 144, 145,
                   150, 151, 162, 164, 166, 167, 168, 171, 190


System  Configurations 

Classifications of references on system configuration are shown
in Table 2.3. The logical approach in the availability analysis
is to decompose the system under consideration into functional
entities composed of components or subsystems. This subdivision
generates a block diagram and describes the system operation.
Models are then formulated to fit this logical structure. In
this way, the block diagram of the system configuration
describes how the components are functionally connected and the
rules of operation.

The simplest structure in availability analysis is the single
configuration, in which only one component comprises a system.

The series configuration is the next simplest and most common
structure. In this configuration the functional operation of the
system depends on the operation of all system components. The
redundant configuration can be divided into two main categories:
the parallel redundant configuration and the standby redundant
configuration. In the parallel redundant configuration the
system operates if any one of the components operates. This
configuration is often called the full redundant configuration.
On the other hand, if the system operation requires more than
one component to operate, this configuration is called the
partial redundant configuration. In the parallel system all the
components are turned on at the beginning and operate until
failure occurs. Using less reliable units in redundant
configurations is one of the methods of coping with the problem
of designing reliable systems. For nonmaintained systems,
redundancy is best applied at the component level rather than at
the system level. However, for systems whose components can be
repaired as they fail, redundancy at the component level may not
be the best policy. The reason is that if component redundancy
is employed, repair may not be possible while the system is
operating, whereas a failure with system redundancy could be
repaired.

TABLE 2.3

CLASSIFICATION OF REFERENCES ON AVAILABILITY
WITH REGARD TO SYSTEM CONFIGURATIONS

System
Configuration          References

Single                 6, 7, 10, 14, 25, 28, 35, 39, 53, 75, 105,
                       114, 116, 156, 157, 165, 179, 182, 193

Series                 10, 14, 23, 53, 78, 90, 96, 119, 126, 130,
                       142, 143, 160, 164, 165, 173, 174, 179,
                       190

Redundant

  Parallel Redundant   2-5, 7, 10, 14, 24, 35, 39, 54, 59, 63,
                       68, 74, 75, 77, 85, 87, 88, 90, 94, 103,
                       104, 111, 118, 120-122, 139, 140, 143,
                       152-155, 157, 158, 165, 173, 179, 192,
                       193

  Standby Redundant    4, 10, 13, 14, 19, 30, 39, 42, 43, 48, 55,
                       59, 65, 68, 77, 78, 88, 93, 104, 109,
                       121, 128, 132-137, 140, 142, 144, 152,
                       157, 165-167, 168, 171, 179

    Perfect Switch     13, 19, 30, 39, 42, 43, 65, 78, 104, 109,
                       136-139, 166, 171

    Imperfect Switch   48, 96, 137, 140, 142

    Cold Standby       13, 30, 39, 59, 65, 76, 78, 79, 125, 128,
                       134, 136, 137, 160, 166

    Warm Standby       19, 42, 43, 104, 144, 167, 170, 171

Series Parallel        10, 39, 54, 65, 75, 106, 110, 112, 143,
                       165, 175, 179

Complex                60, 90

In the standby redundant system the parallel components are not
active at the same time. At the start of operation the switch
connects the input to one component. Meanwhile, other components
are left in standby with zero failure rate or a failure rate
lower than that of the active components. The system in which
standby components cannot fail is referred to as cold standby.
The system is called warm standby if only one component operates
at a time and the standby component has a lower failure rate
than the active component, but not a zero failure rate as in
cold standby.

The standby configuration can be divided according to the type
of switching into two types: (1) perfect switching, and (2)
imperfect switching. If the switching device is assumed to be
perfect, the standby system is better than the parallel system.
The situation changes when the standby component ages and the
switch is imperfect. Figure 2.3 represents the different types
of system configurations.

Based on the configurations discussed above, the system
configuration concept is further extended to include series
parallel, parallel series, and complex. By complex configuration
we mean a system which is not purely series, parallel, series
parallel, or parallel series.
Fig. 2.3. Different Types of System Configurations (node labels: Standby Redundant, Parallel Redundant, Perfect Switching, Imperfect Switching, Cold Standby, Warm Standby)
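
The operating rules for the series, parallel (full redundant), and partial redundant configurations translate directly into product formulas once the components are assumed to fail and be repaired independently of one another. That independence, the function names, and the component availability of 0.9 are assumptions of this sketch rather than results taken from the references.

```python
from math import comb, prod


def series_availability(availabilities):
    """Series: the system is up only if every component is up."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Full redundancy: the system is up if at least one component is up."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def k_out_of_n_availability(k, n, a):
    """Partial redundancy: at least k of n identical components are up."""
    return sum(comb(n, j) * a**j * (1.0 - a) ** (n - j)
               for j in range(k, n + 1))


components = [0.9, 0.9, 0.9]  # hypothetical steady-state availabilities

a_series = series_availability(components)
a_parallel = parallel_availability(components)
a_two_of_three = k_out_of_n_availability(2, 3, 0.9)

print(f"series        = {a_series:.4f}")
print(f"parallel      = {a_parallel:.4f}")
print(f"2-out-of-3    = {a_two_of_three:.4f}")
```

The partial redundant figure always lies between the series and full redundant figures for the same components. Standby configurations are not covered here because their availability depends on the switch and on the standby failure rate.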
CHAPTER III

APPROACHES  USED  IN  OBTAINING 
AVAILABILITY  MODELS 

Markovian 

The Markovian approach in the formulation of the availability
model has been frequently used assuming exponential
distributions for failure times and repair times (see Table 3.1
for references). To obtain the availability model of a given
system using this approach, Sandler [157] suggests that the
following be specified: (1) the component failure process, (2)
the system configuration, (3) the repair policy, and (4) the
state in which the system is defined to be failed (see Chapter
IV for details).

For an illustration, let us consider a single component system
with a constant failure rate, λ, and a constant repair rate, μ
(exponential distributions). Since repair is possible,
transitions can be made back and forth. Thus, two states can be
designated: (1) State 0--the system is operating, and (2) State
1--the system has failed and is under repair.

TABLE 3.1

APPROACHES USED IN OBTAINING AVAILABILITY MODEL

Classification               References

Markovian

  Instantaneous              10, 19, 25, 39, 63, 69, 72, 93, 107,
  Availability               127, 132, 134-137, 153, 157, 165,
                             166, 178, 192

  Average Uptime             10, 39, 63, 69, 157
  Availability

  Steady-State               2, 3, 5, 10, 24, 25, 39, 42, 50,
  Availability               53-56, 59, 63, 69-74, 78, 79, 87, 90,
                             93, 94, 103, 109, 111, 114, 120,
                             134-137, 139, 140, 156, 157, 160,
                             165, 167, 171, 175

Ratio of Uptime              1, 4, 14, 20, 23, 35, 47, 51, 60, 65,
to Total Time                68, 75, 83, 89, 92, 96, 100, 110-112,
                             116, 119, 120, 130, 131, 143, 158,
                             162, 172-174, 188, 190, 193

  MTBF / (MTBF + MTTR)       4, 20, 23, 51, 65, 75, 89, 92, 96,
                             110, 119, 120, 126, 143, 158, 172,
                             173, 190, 193

  MTBM / (MTBM + M)          20, 51, 112

  Uptime /                   20, 51, 60, 111, 116, 131, 164, 174
  (Uptime + Downtime)

Integral Theory              68

Monte Carlo Simulation       60, 123

Single-Cycle                 116, 131
Availability

Multiple-Cycle               96
Availability

Confidence Interval          25, 29, 131, 172-174
of Availability

Bayesian Approach            24, 25, 73, 173, 174

Using conditional probabilities, the transition matrix can be
constructed and the differential equations describing the
stochastic behavior of the system can be formed.


$$\frac{dP_0(t)}{dt} = -\lambda P_0(t) + \mu P_1(t) \tag{3.1}$$

$$\frac{dP_1(t)}{dt} = \lambda P_0(t) - \mu P_1(t) \tag{3.2}$$

where:

P_i(t) denotes the probability of the system being in state i at
time t.

If the system is in operation at time t = 0, the initial
conditions are P_0(0) = 1 and P_1(0) = 0. Transforming equations
(3.1) and (3.2) into Laplace transforms under the above initial
conditions, we have

$$(s + \lambda) P_0(s) - \mu P_1(s) = 1 \tag{3.3}$$

$$-\lambda P_0(s) + (s + \mu) P_1(s) = 0 \tag{3.4}$$

Now the instantaneous availability, A(t), is the inverse Laplace
transform of P_0(s). Solving equations (3.3) and (3.4) for
P_0(s) and inverting gives

$$A(t) = P_0(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t} \tag{3.5}$$

The average uptime availability for some definite period of time
(0, T) can be found by integrating A(t) over this time interval
and dividing by the total time:

$$A(T) = \frac{1}{T} \int_0^T A(t)\,dt = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{(\lambda + \mu)^2 T} \left[ 1 - e^{-(\lambda + \mu)T} \right] \tag{3.6}$$

If we are interested in the long-range availability, we can let
T → ∞ and find the steady-state availability

$$A(\infty) = \frac{\mu}{\lambda + \mu} \tag{3.7}$$

Due to analytical and computational difficulty, not much work
has been done when failure and repair times are other than
exponential. For the analysis of the redundant system with
exponential failure pdf and a general repair time distribution,
Branson and Shah employ a semi-Markov process. Hall and others
[88] analyze the redundant system when failure times and repair
times follow combinations of the exponential, Weibull, and
log-normal distributions. They illustrate the use of Fourier
series for evaluating the inverse Laplace transformation.
Although non-Markovian processes have not been studied as widely
as Markovian processes, Sandler [157] shows that it is often
possible to treat a stochastic process of the non-Markovian type
by reducing it to a Markov process. This can be done by
increasing the number of states, each being described by a
constant transition rate. As an example, consider a single
component system with an Erlang failure distribution with the
cdf

$$F(t) = 1 - e^{-\lambda t}(1 + \lambda t) \tag{3.8}$$

and an exponential repair distribution with the cdf

$$G(t) = 1 - e^{-\mu t} \tag{3.9}$$

By assuming that the component goes through two exponential
phases, each of average length 1/λ, the process can be reduced
to a Markov process with three states: (1) State 0--the system
is operating in the first phase, (2) State 1--the system is
operating in the second phase, and (3) State 2--the system has
failed and is under repair.

This formulation leads to the transition matrix:
The solution of this matrix is simply


(3.10) 


$$R(t) = P_0(t) + P_1(t) = e^{-\lambda t}(1 + \lambda t) \tag{3.11}$$
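
The phase construction can be checked numerically. The sketch below writes down one plausible set of state equations for the three-state model (phase 1 to phase 2 at rate λ, phase 2 to failed at rate λ, failed back to phase 1 at rate μ) and integrates them with a classical Runge-Kutta scheme. These state equations are an assumption of this sketch, not a transcription of the transition matrix referred to in the text. Setting μ = 0 makes the failed state absorbing, and P_0(t) + P_1(t) then reproduces Equation (3.11). The rates and time horizons are invented.

```python
import math


def derivatives(p, lam, mu):
    """Rates of change of (P0, P1, P2) for the assumed three-state model."""
    p0, p1, p2 = p
    return (
        -lam * p0 + mu * p2,  # phase 1: left at rate lam, re-entered on repair
        lam * p0 - lam * p1,  # phase 2: entered from phase 1, left on failure
        lam * p1 - mu * p2,   # failed: entered from phase 2, left on repair
    )


def integrate(lam, mu, t_end, steps):
    """Fourth-order Runge-Kutta integration starting from P(0) = (1, 0, 0)."""
    p = (1.0, 0.0, 0.0)
    h = t_end / steps
    for _ in range(steps):
        k1 = derivatives(p, lam, mu)
        k2 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(p, k1)), lam, mu)
        k3 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(p, k2)), lam, mu)
        k4 = derivatives(tuple(x + h * k for x, k in zip(p, k3)), lam, mu)
        p = tuple(
            x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        )
    return p


lam, mu = 0.02, 0.5  # hypothetical phase rate and repair rate, per hour

# No repair (mu = 0): P0 + P1 is the reliability of the Erlang component.
p0, p1, _ = integrate(lam, mu=0.0, t_end=50.0, steps=1000)
reliability_numeric = p0 + p1
reliability_closed = math.exp(-lam * 50.0) * (1.0 + lam * 50.0)

# With repair: P0 + P1 is the availability; compare with the balance result.
q0, q1, _ = integrate(lam, mu, t_end=1000.0, steps=20000)
availability_long_run = q0 + q1
availability_expected = 2.0 * mu / (2.0 * mu + lam)

print(f"R(50) numeric = {reliability_numeric:.8f}")
print(f"R(50) closed  = {reliability_closed:.8f}")
print(f"A long run    = {availability_long_run:.8f}")
print(f"A expected    = {availability_expected:.8f}")
```

The long-run value 2μ / (2μ + λ) follows from the steady-state balance of the assumed equations and equals mean time to failure (2/λ) divided by the sum of mean time to failure and mean repair time (1/μ).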

Regulinski  [153]  used  the  Markovian  approach  to 
model  the  availability  function  for  computer  networks. 

Gates  [72]  presented  an  analytic  technique  for  evaluating 
the  availability  of  complex  systems  which  are  required  to 
operate  around  the  clock,  but  which  are  staffed  with  main¬ 
tenance  personnel  periodically  on  a  shift  basis.  He  shows 
that  such  systems  can  be  modeled  as  a  periodically,  time 
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varying  Markov  process  governed  by  a  repeatable  sequence 
of  transition  matrices. 

Doyon  [56]  utilizes  the  steady-state  availability 
concept  to  analyze  a  computer  system  consisting  of  a  data 
processor  and  tape  units.  The  purpose  of  the  analysis  is 
to  solve  for  the  MTTR  of  the  redundant  system.  The  author 
points  out  that  defining  the  system  states  and  formulating 
the  appropriate  system  steady-state  availability  transi¬ 
tion  rate  diagram  is  the  step  requiring  the  greatest  degree 
of ingenuity and expertise. By contrast, subsequent steps
to obtain a numerical solution for the system MTTR involve
only routine mathematical manipulations.

The  above  approach  is  called  the  differential 
theory  in  reliability  since  the  states  of  the  system  can 
be  expressed  in  the  form  of  a  set  of  differential  equations 
whose  solution  permits  the  evaluation  of  reliability  and 
availability  of  the  system.  When  failure  and/or  repair 
time  are  not  exponentially  distributed,  the  differential 
theory  is  not  applicable;  so  the  integral  theory  was  intro¬ 
duced  to  overcome  differential  theory  limitations. 

Integral  Theory  of  Reliability 

The first paper on integral theory was published
in 1973. In 1974 integral theory was used to evaluate
the reliability of complex systems, such as telephone
exchanges, whose repair time was not exponentially
distributed [Galetto, 68]. In 1975 it was proved that
integral and differential theories are equivalent when
Markovian processes are studied. In the same year,
integral theory was applied to state a general model for
system cost-effectiveness when failure and repair rates
are assumed constant. In 1977 Galetto used the differential
theory to obtain the reliability and availability
of different system configurations, to derive formulas for
MTTR (mean time to repair), mean uptime (MUT) and mean
downtime (MDT) as functions of MTTR, and then to derive
the steady-state availability, $A(\infty)$:

\[ A(\infty) = \frac{MUT}{MUT + MDT} \tag{3.12} \]


Galetto shows that the ratio MTBF/(MTBF + MTTR) is a meaningless
definition of availability, unless series systems are
considered.

The integral theory of reliability overcomes the
limitation of the differential theory, especially for
mechanical systems, since the failure rate of such systems
increases as they age during operation.


Ratio  of  Uptime  to  Total  Time 
Another approach in the formulation of the availability
model is the use of the definitions of inherent,
achieved, and operational availability. When only corrective
maintenance is considered, the inherent availability,
which is a function of MTBF and MTTR, is employed. In this
case, MTBF is computed by:

\[ MTBF = \int_0^\infty R(t)\,dt \tag{3.13} \]

where:

R(t) is the reliability function of the system.

MTTR is interpreted as synonymous with mean corrective
maintenance time. When both corrective and preventive
maintenance are considered, the achieved availability, which
is a function of MTBM and M, is introduced, where MTBM is
the mean interval between all maintenance requirements, both
corrective and preventive, and M is the mean downtime resulting
from both corrective and preventive maintenance. For
example, when preventive maintenance is scheduled at time
T, MTBM is expressed by

\[ MTBM = \int_0^T R(s)\,ds \tag{3.14} \]

M is expressed as:

\[ M = \frac{M_c f_c + M_p f_p}{f_c + f_p} \tag{3.15} \]

where:

M is the mean downtime resulting from both corrective
and preventive maintenance,

$M_c$ is the mean corrective maintenance time,

$M_p$ is the mean preventive maintenance time,

$f_c$ is the number of corrective maintenance actions, and

$f_p$ is the number of preventive maintenance actions.

Operational availability is an appropriate measure
if downtime includes logistics and administrative time
as well as active maintenance downtime. For the classification
of references, see Table 3.1.
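As a rough illustration of Equations (3.14) and (3.15), the following sketch evaluates MTBM by numerical integration and combines it with the mean maintenance downtime. The ratio MTBM/(MTBM + M) used for achieved availability is the customary definition and is stated here as an assumption, since the text above does not write it out; the function names, the exponential reliability function, and the numbers are hypothetical.

```python
import math


def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def mtbm_with_scheduled_pm(reliability, pm_time):
    """Equation (3.14): integral of R(s) from 0 to the scheduled time T."""
    return integrate(reliability, 0.0, pm_time)


def mean_maintenance_downtime(m_c, f_c, m_p, f_p):
    """Equation (3.15): frequency-weighted mean of corrective and preventive times."""
    return (m_c * f_c + m_p * f_p) / (f_c + f_p)


def achieved_availability(mtbm, m_bar):
    """Assumed definition: MTBM / (MTBM + M)."""
    return mtbm / (mtbm + m_bar)


# Hypothetical example: exponential reliability, preventive maintenance every 100 h.
mtbm = mtbm_with_scheduled_pm(lambda s: math.exp(-0.01 * s), 100.0)
m_bar = mean_maintenance_downtime(m_c=4.0, f_c=3, m_p=1.0, f_p=7)
a_achieved = achieved_availability(mtbm, m_bar)
```

In this example MTBM is about 63.2 hours and M is 1.9 hours, so the achieved availability is roughly 0.97.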

Monte  Carlo  Simulation 

Whenever the problem is extremely complex and/or
experimentation is desirable but costly, Myers suggests
the use of the Monte Carlo technique, and illustrates a
few examples of this solution technique. Faragher and
Watson [60], however, maintain that availability analyses
of complex systems utilizing Monte Carlo simulation techniques
have revealed a lack of realism because they are
inflexible with respect to configuration changes, thus
making them unsuitable for optimization studies of availability
through component redundancy. By incorporating
engineering and mathematical analysis, they present a
realistic methodology which involves an engineering description
of the system, the formulation of the simulation
model, and the computer and engineering analysis of the
system.
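A minimal sketch of the Monte Carlo idea, for the simplest possible case only: a single unit alternating between exponentially distributed up times and repair times. It is not the methodology of Faragher and Watson; the function name, the seed, and the parameter values are hypothetical.

```python
import random


def simulate_availability(lam, mu, horizon, seed=0):
    """Fraction of [0, horizon] spent in the operating state."""
    rng = random.Random(seed)
    clock = 0.0
    up_time = 0.0
    while clock < horizon:
        up = rng.expovariate(lam)             # time to next failure
        up_time += min(up, horizon - clock)   # truncate at the horizon
        clock += up
        if clock >= horizon:
            break
        clock += rng.expovariate(mu)          # repair duration
    return up_time / horizon


estimate = simulate_availability(lam=0.1, mu=1.0, horizon=200_000.0)
```

For a long horizon the estimate should settle near the ratio of the repair rate to the sum of the failure and repair rates, here 1/1.1, which gives a simple sanity check before the same approach is applied to configurations that have no closed-form solution.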
Single-Cycle Availability
The definition of availability given by the
fraction of the total desired operating time has been
quite widely used as a main design criterion. However,
there is no probabilistic guarantee that a specified availability
value will ever be reached other than approximately
in practice. Martz [116], therefore, provides a
definition of single-cycle availability that incorporates
a probabilistic guarantee that the availability value will
be reached in practice. Single-cycle availability is
defined as the value, $A_v$, such that:

\[ P(A \geq A_v) = v, \quad 0 \leq v \leq 1 \tag{3.16} \]

By specifying v we have a probabilistic guarantee on the
frequency of occurrence of the corresponding availability
value.

For example, if we require a system availability
$A_v = 0.99$ and v is chosen to be 0.90, we
are 90 percent certain that our design value of 0.99 will
be met in practice. To illustrate the use of this definition,
Martz [116] presents a few examples with exponential,
uniform, and Rayleigh distributions for failure and repair
times, and shows that the median cycle availability, $A_{0.50}$,
is equivalent to the steady-state availability.
Nakagawa and Goel [131] extend the definition of Martz
to a finite interval. Their definition differs from
Martz's in that they take into consideration the interval
of system operation.
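The definition in Equation (3.16) can be made concrete for the exponential case. The sketch below assumes that one cycle consists of an up time X with rate lam followed by a repair time Y with rate mu, and that the cycle availability is X/(X + Y); under that assumption P(A >= a) = mu*c/(lam + mu*c) with c = (1 - a)/a, which can be solved for a. This derivation and the function names are illustrative and are not taken from Martz.

```python
import random


def single_cycle_availability(lam, mu, v):
    """Value A_v with P(X/(X+Y) >= A_v) = v for exponential X and Y."""
    if not 0.0 < v < 1.0:
        raise ValueError("v must lie strictly between 0 and 1")
    return mu * (1.0 - v) / (mu * (1.0 - v) + lam * v)


def empirical_guarantee(lam, mu, a_v, cycles=100_000, seed=1):
    """Simulated fraction of cycles whose availability reaches a_v."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(cycles):
        up = rng.expovariate(lam)
        down = rng.expovariate(mu)
        if up / (up + down) >= a_v:
            hits += 1
    return hits / cycles


a_90 = single_cycle_availability(lam=0.1, mu=1.0, v=0.90)
median = single_cycle_availability(lam=0.1, mu=1.0, v=0.50)
```

Setting v = 0.5 gives mu/(lam + mu), in agreement with the statement that the median cycle availability equals the steady-state availability; a high confidence level such as v = 0.90 yields a much lower guaranteed value.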

Availability  for  Multiple  Cycles 
and  for  a  Finite  Time 

Kabak [96] discusses two types of availability:
(1) availability for a given number of cycles, and (2) availability
for a given length of time. His concept of availability
is the proportion of time that the system is up and
is denoted by

\[ \frac{t}{t + R} \]

where:

t = failure time, which has a distribution f(t), and

R = a constant repair time.

The availability for one cycle, A(1), is defined
in terms of the expected value of t/(t + R); that is,

\[ A(1) = \int_0^\infty \frac{t}{t + R}\, f(t)\,dt \tag{3.17} \]

For i cycles, the total elapsed time is T + iR, where
$T = \sum_{j=1}^{i} t_j$; i.e., the distribution of T is the
i-fold convolution of the distribution of t. The availability
for i cycles, A(i), is the expected value of T/(T + iR) and
is given by:

\[ A(i) = \int_0^\infty \frac{T}{T + iR}\, g(T)\,dT \tag{3.18} \]

where g(T) is the pdf of T. If t has an exponential
distribution, T has an Erlang distribution with i degrees
of freedom.

The finite time availability is determined by considering
the number of times, n, that the system has
suffered a failure in the interval (0, T), where T is given,
and by combining the associated probability with the proportion
of available time.

In the limit when $T \to \infty$ the finite time availability
approaches the steady-state availability.
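Equations (3.17) and (3.18) can be evaluated by simple quadrature when the failure time is exponential. The sketch below is an illustration under that assumption, with an Erlang density for the sum of i failure times; the function names, the truncation point of the integral, and the parameter values are hypothetical.

```python
import math


def erlang_pdf(x, i, lam):
    """Density of the sum of i independent exponential(lam) failure times."""
    return lam**i * x ** (i - 1) * math.exp(-lam * x) / math.factorial(i - 1)


def cycle_availability(i, lam, repair, n=100_000):
    """Midpoint-rule value of Equation (3.18); i = 1 gives Equation (3.17)."""
    upper = (i + 40.0 * math.sqrt(i) + 40.0) / lam  # far into the Erlang tail
    h = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x / (x + i * repair) * erlang_pdf(x, i, lam)
    return total * h


a_one = cycle_availability(1, lam=1.0, repair=0.1)
a_five = cycle_availability(5, lam=1.0, repair=0.1)
```

For these values the one-cycle availability is close to 0.80, noticeably below the long-run ratio 1/(1 + 0.1), and the five-cycle value lies between the two.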


Confidence Interval of Availability
A point estimate of availability has usually been
the only statistic calculated, although decisions about the
true availability of the system should take uncertainty
into account. Uncertainties in the values of MTBF and MTTR
reflect an uncertainty in the value of the point availability

\[ A(t) = \frac{MTBF}{MTBF + MTTR} \]

Treating these uncertain parameters as random variables,
the distribution of the point availability can be derived
by combining the distributions of the failure and repair
times. Hence, estimates and confidence statements for the
availability can be constructed which are consistent with
the equivalent statements on the failure time and repair
time parameters.

Thompson [172] derives techniques for placing a
lower confidence limit on system availability and for
deciding if the true system availability differs significantly
from a specified value when MTBF and MTTR are estimated
from test data. Assuming times to failure and times
to repair are stochastically independent random variables
that follow exponential distributions with MTBF = $\theta$ and
MTTR = $\phi$ respectively, the $(1 - \alpha)$ lower confidence limit
(LCL) for A is obtained by:

\[ LCL = \frac{\hat{\theta}}{\hat{\theta} + \hat{\phi}\, F_{1-\alpha}(2n, 2n)} \tag{3.19} \]

where:

$\hat{\theta}$ and $\hat{\phi}$ are sample estimates of $\theta$ and $\phi$
respectively, and

n is the number of failure or repair actions.

In a similar manner, a two-sided confidence interval is
derived and given by:

\[ LCL = \frac{\hat{\theta}}{\hat{\theta} + \hat{\phi}\, F_{1-\alpha/2}(2n, 2n)} \tag{3.20} \]

\[ UCL = \frac{\hat{\theta}\, F_{1-\alpha/2}(2n, 2n)}{\hat{\theta}\, F_{1-\alpha/2}(2n, 2n) + \hat{\phi}} \tag{3.21} \]

Butterworth and Nikolaisen [29] are also concerned with
the bounds on the availability function for the exponential
failure distribution and for general repair time
distributions. They employ the gamma, log-normal, and
Weibull distributions as repair time distributions. A
bound on the error is also given. Some numerical examples
are given to illustrate the practicality of the bounds
presented.


Bayesian  Approach 

The  Bayesian  approach  in  the  formulation  of  avail¬ 
ability  models  has  been  employed  in  several  references 
(See  Table  3.1).  Brender  [25]  carries  out  the  statis¬ 
tical  assessment  of  system  availability  within  a  Bayesian 
framework. He considers an availability model consisting
of an alternating sequence of independent exponentially
distributed  operational  and  repair  intervals,  with  the 
failure  time  and  repair  time  parameters  described  by  dis¬ 
tinct  gamma  distributions.  This  model  is  further  extended 
in  Reference  24,  in  which  a  more  general  prior  distribution 
is  considered  for  the  parameters  consisting  of  a  linear 
combination  of  gamma  distributions.  Furthermore,  a  non¬ 
exponential  distribution  with  uncertain  scale  and  shape 
parameters  is  introduced.  Gaver  and  Mazumdar  [73]  provide 
an  analysis  for  a  particular  class  of  sampling  plans, 
with the ultimate goal of estimating the long-run system
availability. They combine mixed data using snap-shot
data along with subsystem life and repair data for a
simple subsystem.

Thompson and Springer [174] extend this result for
snap-shot data to systems of several subsystems. Here,
snap-shot  data  merely  reveals  whether  the  system  is  up  or 
down  at  the  instant  when  the  observation  is  made  and 
applies  only  where  the  state  of  each  subsystem  is  recorded 
on  successive  observations.  A  generalization  of  Reference 
73  to  systems  of  N  subsystems  can  be  seen  in  Reference  173, 
where  data  consists  of  samples  of  subsystem  life  and  repair 
times . 

Brender  [25]  develops  a  Bayes  transformation  which 
utilizes  the  failure  and  repair  data  to  readily  convert 
prior  estimates  and  confidence  statements  on  the  avail¬ 
ability  into  posterior  distributions.  Thompson  and 
Springer  [174]  also  carry  out  a  Bayes  analysis  of  system 
availability  for  an  N  component  series  system.  They  deter¬ 
mine  the  posterior  pdf  of  the  availability  through  the 
derivation  of  the  pdf  of  the  product  of  N  independent  random 
variables  using  the  Mellin  integral  transform.  Confidence 
limits  on  the  system  availability  are  then  obtained  from 
the  knowledge  of  the  posterior  pdf  of  the  availability. 

A numerical procedure for computing Bayes confidence
intervals for the availability can be seen in
Reference 173. Here, both the series and parallel systems
are considered.


A  list  of  references  on  this  topic  is  in  Table 


CHAPTER  IV 


SOME  AVAILABILITY  MODELS  USING  THE 
MARKOVIAN  APPROACH 


Single-Equipment  Systems 

In this case we have only one unit, which can be in
one of two states: (1) State 0--the system is operating,
and (2) State 1--the system has failed and is under repair.
Assume that the failure rate is a constant, $\lambda$; i.e., the
failure distribution is exponential, and that the repair
distribution is also exponential, with repair rate $\mu$. Since the
conditional probability of failure in (t, t+dt) is $\lambda\,dt$ and
the conditional probability of completing a repair in
(t, t+dt) is $\mu\,dt$, we have the following transition matrix:

\[ P = \begin{pmatrix} 1-\lambda & \lambda \\ \mu & 1-\mu \end{pmatrix} \tag{4.1} \]

The system is depicted in Figure 4.1.

The differential equations describing the stochastic
behavior of this system can be formed by considering
the following:

The probability that the system is in State 0 at
time t+dt is derived from the probability that it was in
State 0 at time t and did not fail in (t, t+dt), or that it
was in State 1 at time t and returned to State 0 in (t, t+dt);
thus we have:

\[ P_0(t+dt) = P_0(t)(1 - \lambda\,dt) + P_1(t)\,\mu\,dt + o(dt) \tag{4.2} \]

Similarly, the probability of being in State 1 at time
t+dt is derived from the probability that the system was
in State 0 at time t and failed in (t, t+dt), or that it was in
State 1 at time t and the repair was not completed in
(t, t+dt). Therefore,

\[ P_1(t+dt) = P_0(t)\,\lambda\,dt + P_1(t)(1 - \mu\,dt) + o(dt) \tag{4.3} \]

The term o(dt) in both equations represents the
probability of two events taking place in (t, t+dt), which
is negligible, so we can write the differential equations
in the form:

\[ \begin{aligned} P_0'(t) &= -\lambda P_0(t) + \mu P_1(t) \\ P_1'(t) &= \lambda P_0(t) - \mu P_1(t) \end{aligned} \tag{4.4} \]

where:

$P_i(t)$ is the probability of being in State i at
time t, and

$P_i'(t)$ is the first-order derivative with respect
to t.

Shooman [165] has described a simple algorithm for writing
the above equations: equate the derivative of
the probability at any node to the sum of the transitions
coming into the node. Any unity gain factors of the self
loops must first be set to zero, and the dt factors are
dropped from the branch gains.

Let the system be in State 0 (in operation) at
time t = 0; then the initial conditions are $P_0(0) = 1$,
$P_1(0) = 0$. Transforming Equations (4.4) into Laplace
transforms under the initial conditions, we have

\[ \begin{aligned} sP_0(s) - 1 + \lambda P_0(s) - \mu P_1(s) &= 0 \\ sP_1(s) - \lambda P_0(s) + \mu P_1(s) &= 0 \end{aligned} \]

and simplifying,

\[ \begin{aligned} (s+\lambda)P_0(s) - \mu P_1(s) &= 1 \\ -\lambda P_0(s) + (s+\mu)P_1(s) &= 0 \end{aligned} \tag{4.5} \]

Using Cramer's rule,

\[ P_0(s) = \frac{\begin{vmatrix} 1 & -\mu \\ 0 & s+\mu \end{vmatrix}}{\begin{vmatrix} s+\lambda & -\mu \\ -\lambda & s+\mu \end{vmatrix}} \tag{4.6} \]

and

\[ P_0(s) = \frac{s+\mu}{s(s+\lambda+\mu)} \tag{4.7} \]

Now the availability function A(t) will be the inverse
transform of $P_0(s)$:

\[ A(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\, e^{-(\lambda+\mu)t} \tag{4.8} \]
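As a quick check on Equation (4.8), the sketch below evaluates the point availability and verifies by central differences that it satisfies the first of Equations (4.4) with $P_1 = 1 - P_0$. The function names and the step size h are hypothetical choices made for this illustration.

```python
import math


def point_availability(lam, mu, t):
    """Equation (4.8) for a single unit that starts in the operating state."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def ode_residual(lam, mu, t, h=1e-5):
    """Residual of P0' = -lam*P0 + mu*(1 - P0), using a central difference."""
    slope = (point_availability(lam, mu, t + h)
             - point_availability(lam, mu, t - h)) / (2.0 * h)
    p0 = point_availability(lam, mu, t)
    return slope - (-lam * p0 + mu * (1.0 - p0))
```

The function returns 1 at t = 0, decays monotonically, and approaches the ratio of the repair rate to the sum of the two rates for large t; the residual should be zero up to discretization error.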
In many cases we are interested in the average
uptime for some definite period of time. This can be
found simply by integrating A(t) over the time interval of
interest and dividing by the total time:

\[ A(T) = \frac{1}{T}\int_0^T A(t)\,dt \]

In this instance, we have:

\[ A(T) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{(\lambda+\mu)^2 T} - \frac{\lambda}{(\lambda+\mu)^2 T}\, e^{-(\lambda+\mu)T} \tag{4.9} \]

If we are interested in the long-term availability of the
system, we can let $T \to \infty$ and find

\[ A(\infty) = \frac{\mu}{\lambda+\mu} \tag{4.10} \]
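The interval availability of Equation (4.9) can be compared with a direct numerical average of Equation (4.8). The following is a minimal sketch under the same exponential assumptions; the function names and the trapezoid step count are hypothetical.

```python
import math


def interval_availability(lam, mu, T):
    """Equation (4.9): average availability over (0, T)."""
    total = lam + mu
    return mu / total + lam / (total**2 * T) * (1.0 - math.exp(-total * T))


def interval_availability_numeric(lam, mu, T, n=10_000):
    """Trapezoid-rule average of the point availability of Equation (4.8)."""
    total = lam + mu

    def a(t):
        return mu / total + (lam / total) * math.exp(-total * t)

    h = T / n
    area = 0.5 * (a(0.0) + a(T)) + sum(a(k * h) for k in range(1, n))
    return area * h / T
```

Because the unit starts in the operating state, the average over a short interval exceeds the steady-state value of Equation (4.10) and approaches it as T grows.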

Systems Subject to Two Types of Repair
Consider the problem where an equipment is subject
to two types of repair. When the equipment fails for
the first time a partial repair is performed which restores
the system to operation; however, it increases the probability
of failure. After it fails the second time, a
complete repair is performed which restores the equipment
to a "good-as-new" condition. Let $\lambda_1$ be the failure rate
when the equipment has been through a complete repair,
and $\lambda_2$ when it has been through a partial repair ($\lambda_2 > \lambda_1$).
Similarly, let $\mu_1$ be the repair rate for a partial repair,
and $\mu_2$ be the repair rate for a complete repair ($\mu_2 < \mu_1$).

To formulate the problem we establish four states in which
the system can be at any time: (1) State 0--the system is
operating after a complete repair has been performed;
(2) State 1--the system has failed and a partial repair is
being performed; (3) State 2--the system is operating after
the completion of a partial repair; and (4) State 3--the
system has failed and a complete repair is being performed.
Figure 4.2 depicts the system states. It should be noted
that State 0 and State 2 constitute acceptable system states.

The transition matrix is:

\[ P = \begin{pmatrix} 1-\lambda_1 & \lambda_1 & 0 & 0 \\ 0 & 1-\mu_1 & \mu_1 & 0 \\ 0 & 0 & 1-\lambda_2 & \lambda_2 \\ \mu_2 & 0 & 0 & 1-\mu_2 \end{pmatrix} \tag{4.11} \]

The resulting system of differential equations is

\[ \begin{aligned} P_0'(t) &= -\lambda_1 P_0(t) + \mu_2 P_3(t) \\ P_1'(t) &= \lambda_1 P_0(t) - \mu_1 P_1(t) \\ P_2'(t) &= \mu_1 P_1(t) - \lambda_2 P_2(t) \\ P_3'(t) &= \lambda_2 P_2(t) - \mu_2 P_3(t) \end{aligned} \tag{4.12} \]

For steady-state behavior it can easily be shown
that the limit of $P_i(t)$ always exists; i.e.,
$P_i = \lim_{t \to \infty} P_i(t)$.
This means that the steady-state solutions can be found by
setting the derivatives $P_i'(t)$ equal to zero. Then the
system of differential equations reduces to a system of
algebraic equations. So Equations (4.12) can be reduced
to the following system of algebraic equations:

\[ \begin{aligned} 0 &= -\lambda_1 P_0 + \mu_2 P_3 \\ 0 &= \lambda_1 P_0 - \mu_1 P_1 \\ 0 &= \mu_1 P_1 - \lambda_2 P_2 \\ 0 &= \lambda_2 P_2 - \mu_2 P_3 \end{aligned} \tag{4.13} \]

To solve these equations we must also make use of the fact
that the $P_i$'s form a probability distribution; i.e.,
$\sum_{i=0}^{n} P_i = 1$. So adding this equation to the above system
of algebraic equations and solving, we can find the steady-state
availability

\[ A(\infty) = P_0 + P_2 = \frac{\mu_1 \mu_2 (\lambda_1 + \lambda_2)}{\lambda_1\lambda_2\mu_1 + \lambda_2\mu_1\mu_2 + \lambda_1\lambda_2\mu_2 + \lambda_1\mu_1\mu_2} \tag{4.14} \]

It can be seen that if $\lambda_1 = \lambda_2$ and $\mu_1 = \mu_2$, Equation
(4.14) reduces to $\mu/(\lambda+\mu)$, which is the same value as in the
previous model.
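The steady-state procedure described above (set the derivatives to zero, replace one balance equation by the normalization condition, and solve) can be sketched as a small general-purpose routine. This is an illustrative implementation using plain Gaussian elimination; the function name, the dictionary format for the rates, and the numerical values are hypothetical.

```python
def steady_state(rates, n):
    """Stationary probabilities of an n-state Markov model.

    rates maps (i, j) with i != j to the constant transition rate from i to j.
    """
    a = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        a[j][i] += rate   # flow into state j from state i
        a[i][i] -= rate   # flow out of state i
    a[n - 1] = [1.0] * n  # replace the last balance equation by sum(P) = 1
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (b[r] - sum(a[r][c] * p[c] for c in range(r + 1, n))) / a[r][r]
    return p


# Two types of repair: 0 -> 1 (lam1), 1 -> 2 (mu1), 2 -> 3 (lam2), 3 -> 0 (mu2).
lam1, lam2, mu1, mu2 = 0.01, 0.03, 0.5, 0.2
probs = steady_state({(0, 1): lam1, (1, 2): mu1, (2, 3): lam2, (3, 0): mu2}, 4)
availability = probs[0] + probs[2]
closed_form = mu1 * mu2 * (lam1 + lam2) / (
    lam1 * lam2 * mu1 + lam2 * mu1 * mu2 + lam1 * lam2 * mu2 + lam1 * mu1 * mu2
)
```

The numerical availability and the closed form of Equation (4.14) should agree to rounding error; the same routine applies to any of the finite-state models in this chapter once the rates are listed.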
System with Series Configurations

Consider the simple system where two equipments
are connected in series such that if either fails the system
fails. For simplicity, we shall assume that each
equipment fails at the same rate, $\lambda$, and can be repaired
at the same rate, $\mu$. Now the system can be thought of as
being in any one of three possible states at some time, t:
(1) State 0--when both equipments are operating; (2) State 1--
when one equipment is operating and the second is under
repair; and (3) State 2--when both equipments are under
repair.

Since both equipments are required, the system is
defined as down when it reaches State 1. Thus, $A(t) = P_0(t)$,
the probability that the system is in State 0 at time, t.

The availability function is directly influenced
by the number of repairmen available to service the failed
equipments. So we will consider first the case when there
is a single repairman, and then when there are two repairmen
working independently or working together.


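One Repairman Case

When a single repairman is available to service the
two equipments, the system transition matrix P is:

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ \mu & 1-(\lambda+\mu) & \lambda \\ 0 & \mu & 1-\mu \end{pmatrix} \tag{4.15} \]

The Markov graph of this system is depicted in
Figure 4.3. The resulting system of differential equations
is:

\[ \begin{aligned} P_0'(t) &= -2\lambda P_0(t) + \mu P_1(t) \\ P_1'(t) &= 2\lambda P_0(t) - (\lambda+\mu)P_1(t) + \mu P_2(t) \\ P_2'(t) &= \lambda P_1(t) - \mu P_2(t) \end{aligned} \tag{4.16} \]

As mentioned before, this system of differential
equations reduces in the steady state to a system of
algebraic equations:

\[ \begin{aligned} 0 &= -2\lambda P_0 + \mu P_1 \\ 0 &= 2\lambda P_0 - (\lambda+\mu)P_1 + \mu P_2 \\ 0 &= \lambda P_1 - \mu P_2 \\ 1 &= P_0 + P_1 + P_2 \end{aligned} \tag{4.17} \]

Solving, we find that:

\[ P_0 = \frac{\mu^2}{\mu^2 + 2\lambda\mu + 2\lambda^2}, \quad P_1 = \frac{2\lambda\mu}{\mu^2 + 2\lambda\mu + 2\lambda^2}, \quad P_2 = \frac{2\lambda^2}{\mu^2 + 2\lambda\mu + 2\lambda^2} \tag{4.18} \]

The steady-state availability, $A(\infty)$, will be

\[ A(\infty) = P_0 = \frac{\mu^2}{\mu^2 + 2\mu\lambda + 2\lambda^2} \tag{4.19} \]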


Next  we  will  consider  the  case  of  two  equipments  in  series 
with  two  repairmen. 


Two  Equipments  in  Series 
With  Two  Repairmen 

First, we will consider the case where each repairman
can only work on one particular equipment. The Markov
graph of this system is depicted in Figure 4.4. The
transition matrix P of this system is:

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ \mu & 1-(\lambda+\mu) & \lambda \\ 0 & 2\mu & 1-2\mu \end{pmatrix} \tag{4.20} \]

The difference between Equations (4.15) and (4.20) is in the
last row. This occurs because if we are in State 2 at
time, t, we can return to State 1 if either of the
equipments is repaired.

The steady-state equations of this system are:

\[ \begin{aligned} 0 &= -2\lambda P_0 + \mu P_1 \\ 0 &= 2\lambda P_0 - (\lambda+\mu)P_1 + 2\mu P_2 \\ 0 &= \lambda P_1 - 2\mu P_2 \\ 1 &= P_0 + P_1 + P_2 \end{aligned} \tag{4.21} \]

Solving, we find that:

\[ A(\infty) = P_0 = \frac{\mu^2}{(\lambda+\mu)^2} \tag{4.22} \]


Joint  Servicing  of  Failed  Equipments 
In the previous case, if the two repairmen do not
work independently of each other, we might expect
that both of them would attempt to service the equipment
that failed. The only time they would work independently
is when both equipments have failed. Sandler [157]
assumed that if two repairmen are servicing a single equipment,
the repair rate is $1.5\mu$. It is further assumed that
if both repairmen are servicing a single equipment and a
second one fails, the second repairman immediately returns
to service his own equipment. In this case, the transition
matrix will be:

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ 1.5\mu & 1-(1.5\mu+\lambda) & \lambda \\ 0 & 2\mu & 1-2\mu \end{pmatrix} \tag{4.23} \]

The steady-state equations of this system are:

\[ \begin{aligned} 0 &= -2\lambda P_0 + 1.5\mu P_1 \\ 0 &= 2\lambda P_0 - (1.5\mu+\lambda)P_1 + 2\mu P_2 \\ 0 &= \lambda P_1 - 2\mu P_2 \\ 1 &= P_0 + P_1 + P_2 \end{aligned} \tag{4.24} \]

Solving, we find that:

\[ A(\infty) = P_0 = \frac{3\mu^2}{3\mu^2 + 4\lambda\mu + 2\lambda^2} \tag{4.25} \]
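To see how the repair arrangement affects a two-equipment series system, the three steady-state results, Equations (4.19), (4.22), and (4.25), can be evaluated side by side. The sketch below simply codes those closed forms; the function names and the rates used are hypothetical.

```python
def series_one_repairman(lam, mu):
    """Equation (4.19): a single repairman serves both equipments."""
    return mu**2 / (mu**2 + 2.0 * lam * mu + 2.0 * lam**2)


def series_two_independent(lam, mu):
    """Equation (4.22): each repairman is dedicated to one equipment."""
    return mu**2 / (lam + mu) ** 2


def series_joint_servicing(lam, mu):
    """Equation (4.25): both repairmen share a single failure at rate 1.5*mu."""
    return 3.0 * mu**2 / (3.0 * mu**2 + 4.0 * lam * mu + 2.0 * lam**2)


lam, mu = 0.1, 1.0
results = {
    "one repairman": series_one_repairman(lam, mu),
    "two independent": series_two_independent(lam, mu),
    "joint servicing": series_joint_servicing(lam, mu),
}
```

For a failure rate of 0.1 and a repair rate of 1.0 the three values are about 0.820, 0.826, and 0.877, so in this example most of the gain comes from letting both repairmen work on a single failure rather than from adding the second repairman alone.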


Availability  Models  of  Parallel 
Redundant  Configurations 

Consider a two-equipment redundant system operating
in parallel which can be in the following states:
(1) State 0--both equipments operating, (2) State 1--one
equipment operating and one equipment under repair, and
(3) State 2--both equipments under repair.

When the system is in State 2 it is defined as
failed. The transition diagram is depicted in Figure 4.5.
The transition matrix is developed in the same manner
as before. The transition matrix P is:

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ \mu & 1-(\lambda+\mu) & \lambda \\ 0 & 2\mu & 1-2\mu \end{pmatrix} \tag{4.26} \]

The transition matrix leads directly to the system of
linear homogeneous differential equations which describe
the stochastic behavior of this system and are as follows:

\[ \begin{aligned} P_0'(t) &= -2\lambda P_0(t) + \mu P_1(t) \\ P_1'(t) &= 2\lambda P_0(t) - (\lambda+\mu)P_1(t) + 2\mu P_2(t) \\ P_2'(t) &= \lambda P_1(t) - 2\mu P_2(t) \end{aligned} \tag{4.27} \]

Considering the initial condition, let the system be in
State 0 at time 0; then

\[ P_0(0) = 1, \quad P_1(0) = 0, \quad P_2(0) = 0 \]

Taking Laplace transforms of Equations (4.27),

\[ \begin{aligned} sP_0(s) - P_0(0) &= -2\lambda P_0(s) + \mu P_1(s) \\ sP_1(s) - P_1(0) &= 2\lambda P_0(s) - (\lambda+\mu)P_1(s) + 2\mu P_2(s) \\ sP_2(s) - P_2(0) &= \lambda P_1(s) - 2\mu P_2(s) \end{aligned} \tag{4.28} \]

Using the initial conditions, we obtain:

\[ \begin{aligned} (s+2\lambda)P_0(s) - \mu P_1(s) &= 1 \\ -2\lambda P_0(s) + (s+\lambda+\mu)P_1(s) - 2\mu P_2(s) &= 0 \\ -\lambda P_1(s) + (s+2\mu)P_2(s) &= 0 \end{aligned} \tag{4.29} \]


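Solving, using Cramer's rule, we obtain:

\[ P_2(s) = \frac{\begin{vmatrix} s+2\lambda & -\mu & 1 \\ -2\lambda & s+\lambda+\mu & 0 \\ 0 & -\lambda & 0 \end{vmatrix}}{\begin{vmatrix} s+2\lambda & -\mu & 0 \\ -2\lambda & s+\lambda+\mu & -2\mu \\ 0 & -\lambda & s+2\mu \end{vmatrix}} \tag{4.30} \]

Thus,

\[ P_2(s) = \frac{2\lambda^2}{s(s+2\lambda+2\mu)(s+\lambda+\mu)} \tag{4.31} \]

Breaking this expression into partial fractions (letting
$a = \lambda+\mu$), we obtain:

\[ \frac{2\lambda^2}{s(s+2a)(s+a)} = \frac{A}{s} + \frac{B}{s+2a} + \frac{C}{s+a} = \frac{As^2 + 3asA + 2a^2A + Bs^2 + Bsa + Cs^2 + 2asC}{s(s+2a)(s+a)} \tag{4.32} \]

Equating constant terms we have

\[ A = \frac{\lambda^2}{(\lambda+\mu)^2} \tag{4.33} \]

Equating coefficients of $s$ and $s^2$ we obtain

\[ B = \frac{\lambda^2}{(\lambda+\mu)^2} \tag{4.34} \]

\[ C = -\frac{2\lambda^2}{(\lambda+\mu)^2} \tag{4.35} \]

Hence,

\[ P_2(s) = \frac{\lambda^2}{(\lambda+\mu)^2}\,\frac{1}{s} + \frac{\lambda^2}{(\lambda+\mu)^2}\,\frac{1}{s+2\lambda+2\mu} - \frac{2\lambda^2}{(\lambda+\mu)^2}\,\frac{1}{s+\lambda+\mu} \tag{4.36} \]

Taking inverse Laplace transforms,

\[ P_2(t) = \frac{\lambda^2}{(\lambda+\mu)^2} + \frac{\lambda^2}{(\lambda+\mu)^2}\, e^{-2(\lambda+\mu)t} - \frac{2\lambda^2}{(\lambda+\mu)^2}\, e^{-(\lambda+\mu)t} \tag{4.37} \]

Since $P_2(t)$ is the probability of being in the failed
state at time t, the availability at time, t, is given by:

\[ A(t) = 1 - P_2(t) = P_0(t) + P_1(t) \tag{4.38} \]

\[ A(t) = \frac{\mu^2 + 2\lambda\mu}{(\lambda+\mu)^2} - \frac{\lambda^2}{(\lambda+\mu)^2}\, e^{-2(\lambda+\mu)t} + \frac{2\lambda^2}{(\lambda+\mu)^2}\, e^{-(\lambda+\mu)t} \tag{4.39} \]

From Equation (4.39) we obtain the steady-state expression:

\[ A(\infty) = \lim_{T \to \infty} \frac{1}{T}\int_0^T A(t)\,dt = \frac{\mu^2 + 2\lambda\mu}{(\lambda+\mu)^2} \tag{4.40} \]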
In the two-equipment parallel system with two
repairmen, we might expect both of them to work together
if one unit failed. However, they would work independently
if both units have failed. Thus, we may have the case that
if a single repairman services a failed unit, the repair
rate is $\mu$, but if two repairmen service the same failed
equipment the repair rate is $1.5\mu$ [Sandler 157]. If we
further assume that when both repairmen are servicing a
single unit and the second one fails, the second repairman
immediately returns to service his own unit, then the
transition matrix is as follows:

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ 1.5\mu & 1-(\lambda+1.5\mu) & \lambda \\ 0 & 2\mu & 1-2\mu \end{pmatrix} \tag{4.41} \]


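In this case it is assumed that the failure of any unit is
detected the instant it occurs. Very often this is not
the case, and the repair operation starts only when the
entire system has failed.

Let us consider the model in which only one unit
is repaired if the system of two units in parallel fails
due to failure of both units. It is only when preventive
maintenance is undertaken that the system is restored to
the state where both units are operating. There is only
one repairman. The Markov graph is shown in Figure 4.6 and
the transition matrix is

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 \\ 0 & 1-\lambda & \lambda \\ 0 & \mu & 1-\mu \end{pmatrix} \tag{4.42} \]

The differential equations are:

\[ \begin{aligned} P_0'(t) &= -2\lambda P_0(t) \\ P_1'(t) &= 2\lambda P_0(t) - \lambda P_1(t) + \mu P_2(t) \\ P_2'(t) &= \lambda P_1(t) - \mu P_2(t) \end{aligned} \tag{4.43} \]

Taking Laplace transforms and using the initial conditions
$P_0(0) = 1$, $P_1(0) = 0$, and $P_2(0) = 0$, then:

\[ \begin{aligned} (s+2\lambda)P_0(s) &= 1 \\ -2\lambda P_0(s) + (s+\lambda)P_1(s) - \mu P_2(s) &= 0 \\ -\lambda P_1(s) + (s+\mu)P_2(s) &= 0 \end{aligned} \tag{4.44} \]

and

\[ P_2(s) = \frac{\begin{vmatrix} s+2\lambda & 0 & 1 \\ -2\lambda & s+\lambda & 0 \\ 0 & -\lambda & 0 \end{vmatrix}}{\begin{vmatrix} s+2\lambda & 0 & 0 \\ -2\lambda & s+\lambda & -\mu \\ 0 & -\lambda & s+\mu \end{vmatrix}} \tag{4.45} \]

or

\[ P_2(s) = \frac{2\lambda^2}{s(s+2\lambda)(s+\lambda+\mu)} = \frac{\lambda}{\lambda+\mu}\,\frac{1}{s} - \frac{\lambda}{\mu-\lambda}\,\frac{1}{s+2\lambda} + \frac{2\lambda^2}{\mu^2-\lambda^2}\,\frac{1}{s+\lambda+\mu} \tag{4.46} \]

Taking inverse Laplace transforms, we obtain:

\[ P_2(t) = \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\mu-\lambda}\, e^{-2\lambda t} + \frac{2\lambda^2}{\mu^2-\lambda^2}\, e^{-(\lambda+\mu)t} \tag{4.47} \]

and

\[ A(t) = 1 - P_2(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\mu-\lambda}\, e^{-2\lambda t} - \frac{2\lambda^2}{\mu^2-\lambda^2}\, e^{-(\lambda+\mu)t} \tag{4.48} \]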


Now if, in the system with two units in parallel and two
repairmen, the status of the individual units is not
monitored, repair will not begin until the system is in
State 2, where both units have failed. We can define the
four states with reference to the Markov graph shown in
Figure 4.7 as follows: (1) State 0--both units are operating;
(2) State 1--one unit is operating, one has failed and
has not been detected; (3) State 2--both units have failed and
are under repair; and (4) State 3--one unit is operating,
one has failed and is under repair.

The transition matrix is:

\[ P = \begin{pmatrix} 1-2\lambda & 2\lambda & 0 & 0 \\ 0 & 1-\lambda & \lambda & 0 \\ 0 & 0 & 1-2\mu & 2\mu \\ \mu & 0 & \lambda & 1-(\lambda+\mu) \end{pmatrix} \tag{4.49} \]

Fig. 4.7. Markov Graph for a System with Two Identical
Units in Parallel and Two Repairmen, when only
at System Failure, Both Units are Repaired

The system of the differential equations is:

\[ \begin{aligned} P_0'(t) &= -2\lambda P_0(t) + \mu P_3(t) \\ P_1'(t) &= 2\lambda P_0(t) - \lambda P_1(t) \\ P_2'(t) &= \lambda P_1(t) - 2\mu P_2(t) + \lambda P_3(t) \\ P_3'(t) &= 2\mu P_2(t) - (\mu+\lambda)P_3(t) \end{aligned} \tag{4.50} \]

Taking the Laplace transforms with the initial
conditions $P_0(0) = 1$, $P_1(0) = 0$, $P_2(0) = 0$, and $P_3(0) = 0$,
we have

\[ \begin{aligned} (s+2\lambda)P_0(s) - \mu P_3(s) &= 1 \\ -2\lambda P_0(s) + (s+\lambda)P_1(s) &= 0 \\ -\lambda P_1(s) + (s+2\mu)P_2(s) - \lambda P_3(s) &= 0 \\ -2\mu P_2(s) + (s+\mu+\lambda)P_3(s) &= 0 \end{aligned} \tag{4.51} \]

and

\[ P_2(s) = \frac{2\lambda^2(s+\mu+\lambda)}{s(s+3\lambda)\left(s^2 + s(3\mu+\lambda) + 2\mu^2\right)} \tag{4.52} \]

The solution for the roots of $s^2 + s(3\mu+\lambda) + 2\mu^2$ yields

\[ r_1, r_2 = \frac{-(3\mu+\lambda) \pm \sqrt{(3\mu+\lambda)^2 - 8\mu^2}}{2} \tag{4.53} \]

Hence,

\[ P_2(s) = \frac{2\lambda^2(s+\mu+\lambda)}{s(s+3\lambda)(s-r_1)(s-r_2)} \tag{4.54} \]

Breaking this expression into partial fractions,

\[ P_2(s) = \frac{A}{s} + \frac{B}{s+3\lambda} + \frac{C}{s-r_1} + \frac{D}{s-r_2} \tag{4.55} \]

The values of A, B, C, and D can be found by suppression.
Taking the inverse Laplace transforms, we obtain

\[ P_2(t) = A + B e^{-3\lambda t} + C e^{r_1 t} + D e^{r_2 t} \tag{4.56} \]

and the availability is given by

\[ A(t) = 1 - P_2(t) \tag{4.57} \]

Inspection of the quadratic equation for $r_1$, $r_2$ shows that
$r_1$ and $r_2$ are always negative real numbers since $\lambda$ and $\mu$
are always positive; therefore, all the time functions
are decaying exponentially and the instantaneous availability,
A(t), rapidly converges to the steady-state value.

Equation (4.56) is complex in nature due to $r_1$ and
$r_2$ not having simple forms and, consequently, it is not
easy to obtain the steady-state availability from Equation
(4.57). But the steady-state availability may be obtained
by studying the steady-state behavior. This steady-state
solution can be found by setting the derivatives $P_i'(t)$
equal to zero. Then the system of differential equations
reduces to a system of algebraic equations. The additional
fact that the $P_i$'s form a probability distribution, and hence
$\sum_{i=0}^{n} P_i = 1$, needs to be used, where n is the number of
possible states. So to obtain the steady-state availability
the set of equations is:


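\[
\begin{aligned}
0 &= -2\lambda P_0 + \mu P_3 \\
0 &= 2\lambda P_0 - \lambda P_1 \\
0 &= \lambda P_1 - 2\mu P_2 + \lambda P_3 \\
0 &= 2\mu P_2 - (\lambda+\mu) P_3 \\
1 &= P_0 + P_1 + P_2 + P_3
\end{aligned}
\tag{4.58}
\]

Solving for $P_2$ using the last four equations,

\[
P_2 =
\frac{\begin{vmatrix}
2\lambda & -\lambda & 0 & 0 \\
0 & \lambda & 0 & \lambda \\
0 & 0 & 0 & -(\lambda+\mu) \\
1 & 1 & 1 & 1
\end{vmatrix}}
{\begin{vmatrix}
2\lambda & -\lambda & 0 & 0 \\
0 & \lambda & -2\mu & \lambda \\
0 & 0 & 2\mu & -(\lambda+\mu) \\
1 & 1 & 1 & 1
\end{vmatrix}}
= \frac{2\lambda(\lambda^2 + \lambda\mu)}{6\mu\lambda^2 + 2\lambda^3 + 6\lambda\mu^2}
\tag{4.59}
\]

\[
P_2 = \frac{\lambda^2 + \lambda\mu}{\lambda^2 + 3\mu\lambda + 3\mu^2}
\]

The steady-state availability is:

\[
A(\infty) = 1 - P_2 = \frac{2\mu\lambda + 3\mu^2}{\lambda^2 + 3\mu\lambda + 3\mu^2}
\tag{4.60}
\]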


Many  complex  problems  can  similarly  be  solved  for  the 
steady-state  availability  without  too  much  difficulty. 
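
The algebraic route can be checked mechanically. The sketch below, which assumes illustrative rate values and uses helper names that are not from the source, solves the last four equations of (4.58) with exact rational arithmetic and compares $1 - P_2$ with the closed form (4.60).

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact Fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in a row with a nonzero entry in this column to use as pivot.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def steady_state(lam, mu):
    """State probabilities from the last four equations of (4.58)."""
    a = [
        [2 * lam, -lam, 0, 0],
        [0, lam, -2 * mu, lam],
        [0, 0, 2 * mu, -(lam + mu)],
        [1, 1, 1, 1],
    ]
    return solve_linear(a, [0, 0, 0, 1])


if __name__ == "__main__":
    lam, mu = Fraction(1, 100), Fraction(1, 2)  # assumed illustrative rates
    p = steady_state(lam, mu)
    closed_form = (2 * mu * lam + 3 * mu**2) / (lam**2 + 3 * mu * lam + 3 * mu**2)
    print("A(inf) from balance equations:", 1 - p[2])
    print("A(inf) from Equation (4.60):  ", closed_form)
```

Exact fractions are used so that agreement with (4.60) is an identity rather than a floating-point approximation; the same solver applies to any other set of balance equations of this form.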

Availability  of  Standby 
Redundant  Configurations 

Standby redundancy assumes that the off-line
equipment(s) either cannot fail or have a failure rate
less than that of the on-line equipment. When this is true,
we would expect a system's availability to be greater with
standby redundancy than with parallel redundancy. Consider a
two-equipment standby system where the on-line equipment
fails at the rate $\lambda$ and the off-line equipment cannot
fail until it is switched to an on-line position. Assuming
perfect switch reliability, the transition diagram for
this system is depicted in Figure 4.8.

The transition matrix for this system is:

\[
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 1-\lambda & \lambda & 0 \\
1 & \mu & 1-(\lambda+\mu) & \lambda \\
2 & 0 & \mu & 1-\mu
\end{array}
\tag{4.61}
\]

The steady-state equations of this system are:


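\[
\begin{aligned}
0 &= -\lambda P_0 + \mu P_1 \\
0 &= \lambda P_0 - (\lambda+\mu) P_1 + \mu P_2 \\
0 &= \lambda P_1 - \mu P_2 \\
1 &= P_0 + P_1 + P_2
\end{aligned}
\tag{4.62}
\]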


The steady-state availability can be found as:

\[
A(\infty) = P_0 + P_1 = 1 - P_2
          = \frac{\mu^2 + \lambda\mu}{\mu^2 + \lambda\mu + \lambda^2}
\tag{4.63}
\]
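
The step from the balance equations (4.62) to the result (4.63) can be written out as a short worked derivation. The ratio $\rho = \lambda/\mu$ is shorthand introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
P_1 &= \rho P_0, \qquad P_2 = \rho P_1 = \rho^2 P_0, \qquad \rho = \lambda/\mu \\
1 &= P_0 \,(1 + \rho + \rho^2)
  \;\Rightarrow\; P_0 = \frac{\mu^2}{\mu^2 + \lambda\mu + \lambda^2} \\
A(\infty) &= P_0 + P_1 = \frac{\mu^2 + \lambda\mu}{\mu^2 + \lambda\mu + \lambda^2}
\end{align*}
```

The first line uses the first and third equations of (4.62); the second line is the normalization condition.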


CHAPTER  V 

CORRECTIVE  AND  PREVENTIVE  MAINTENANCE 
AND  OPTIMIZATION  TECHNIQUES 


Effect  of  Corrective  and 
Preventive  Maintenance 

At one time or another all recoverable systems
are subject to some form of maintenance. In general,
there are two categories of maintenance actions. The
first is off-schedule, or corrective, maintenance and is
performed whenever there is an in-service failure or
malfunction. The system operation is restored by replacing,
repairing or adjusting the component or components which
caused the interruption of service. The second category
is the scheduled or preventive maintenance and is performed
at regular intervals to keep the system in a condition
consistent with its built-in levels of performance,
reliability, and safety. According to Bazovsky [14], during
preventive maintenance, servicing, and inspection, minor
and major overhauls are done such that

1.  regular care is provided to normally operating
subsystems and components which require such
attention (lubrication, refueling, cleaning,
adjustment, alignment, etc.);

2.  failed redundant components are checked, replaced,
or repaired if the system contains redundancy; and

3.  components which are nearing a wearout condition
are replaced or overhauled.

Preventive maintenance is usually associated with
wearout failures. Preventive maintenance policies consist
of some action depending upon either the operating age
of certain components in the system, the state of the
system degradation, or the system configuration. In the
first case, a preventive maintenance policy is usually
some program for the planned replacement or repair of
certain critical components after they have accumulated a
given number of operating hours. In the second case, the
preventive maintenance policies are designed to minimize
the time the system will spend in the degraded state.
In the third case, the preventive maintenance policies
consist of periodic inspection and repair to increase the
mean life of the system.

Planned replacements or maintenance actions are
advantageous for systems and parts whose failure rates
increase with time, or which are less costly to replace or
repair when operating than after failure. Under preventive
maintenance policies it may be possible either to
increase a piece of equipment's availability or reliability
or to minimize the total cost of replacement and
repairs. Thus, one of the most important maintenance
problems is that of specifying a maintenance policy which
balances the cost of failures against the cost of preventive
maintenance actions in order to minimize total
maintenance cost. For preventive maintenance to be
worthwhile, the failure rate of the system must increase
over time or the preventive maintenance of the system must
cost less than the corrective maintenance. Normally,
preventive maintenance for a component is assumed to have the
same effect as the replacement of the component. In
general, four different types of preventive maintenance
are possible (see Table 5.1).


TABLE 5.1

TYPE OF PREVENTIVE MAINTENANCE

Type of Preventive Maintenance      References
----------------------------------  ------------------------------
Block replacement type              10, 17, 39, 175, 185, 195
Age replacement type                8, 11, 1., 38, 41, 52, 112,
                                    125, 133, 155, 181-184, 195
Random periodic replacement type    10, 26, 64, 78, 182, 183, 195
Sequentially determined             8, 10-12, 98, 195
  replacement type

In block replacement, all components of a given
type are replaced (or repaired) simultaneously at times
independent of the failure history of the system. This
policy is perhaps more realistic than others since it does
not require the keeping of records on component use, but
it has the undesirable characteristic that relatively new
components are replaced. This method is sometimes called
minimal repair-replacement type because for failure only a
minimal repair is done, then the system is always replaced
at age T. By definition, a minimal repair does not
affect the hazard rate of the system but it enables the
system to continue its work. It is often called "bad as
old."

In age replacement, we replace a component exactly
at the time of failure or at T hours after its installation
(previous replacement or previous preventive
maintenance), whichever occurs first (T is constant). The
random periodic policy differs only in that T is a random
variable. Gopalan and D'Souza [78] have found the
availability and reliability of a 1-server 2-unit system
subject to preventive maintenance and repair under the
assumption that the pdf's of the times to failure and to
preventive maintenance of a unit are arbitrary, while the
repair and preventive maintenance rates are constant but
different. Gopalan and Venkatachalam [81] extended this
work to an n-unit system and also analyzed an n-unit
system in which each unit consists of two components
connected in series. The sequentially determined replacement
policy is one in which the replacement interval is
determined at each removal (or preventive maintenance) in accord
with the time remaining to the time span.
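
The cost balance behind an age replacement policy can be illustrated numerically. The sketch below assumes a Weibull time-to-failure distribution and two hypothetical costs, `c_fail` (replacement after failure) and `c_prev` (planned replacement at age T); none of these values are taken from the surveyed papers. It evaluates the standard renewal-reward expression for the long-run cost per unit time, $[c_f F(T) + c_p R(T)] / \int_0^T R(t)\,dt$, over a grid of candidate ages.

```python
import math


def reliability(t, shape, scale):
    """Weibull survival function R(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(T, shape, scale, c_fail, c_prev, n=2000):
    """Long-run cost per unit time when replacing at failure or at age T."""
    h = T / n
    # Trapezoidal rule for the expected cycle length, integral of R(t) on [0, T].
    area = 0.5 * (reliability(0.0, shape, scale) + reliability(T, shape, scale))
    area += sum(reliability(i * h, shape, scale) for i in range(1, n))
    area *= h
    r_T = reliability(T, shape, scale)
    return (c_fail * (1.0 - r_T) + c_prev * r_T) / area


def best_age(shape, scale, c_fail, c_prev, candidates):
    """Pick the candidate replacement age with the lowest cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, shape, scale, c_fail, c_prev))


if __name__ == "__main__":
    ages = [0.05 * k for k in range(1, 101)]  # candidate ages 0.05 .. 5.0
    for shape in (1.0, 2.0):  # constant vs. increasing failure rate
        T = best_age(shape, 1.0, c_fail=10.0, c_prev=1.0, candidates=ages)
        print(f"shape = {shape}: best candidate age T = {T:.2f}")
```

With a constant failure rate (shape 1) the cost rate keeps falling as T grows, so the search settles on the largest candidate, which amounts to never replacing preventively; with an increasing failure rate (shape 2) a finite replacement age wins. This matches the condition stated earlier for preventive maintenance to be worthwhile.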

The earliest approach to the planned replacement
problem was taken by Campbell [36] and Welker [183]. It
is concerned with mass replacement, and develops a method
for determining optimum replacement intervals for certain
vacuum tubes. Optimum block replacement policies for an
infinite time span are also studied by Savage [161]. A
theory of optimum sequential replacement policies for the
case of a finite time horizon has been developed by Barlow
and Proschan [12]. They show that for a finite time
horizon there exist policies which require that after
each removal the next planned replacement interval is
selected to minimize the expected expenditure during the
remaining time, and that these policies will be more
effective than a fixed replacement policy. However, periodic
or preventive maintenance policies assuming an infinite
usage horizon seem to have received the most attention in
the literature.

Earlier work on restricted forms of preventive
maintenance problems is found in Reference 181. In a
series of reports, Weiss [181-183] considers the effect
on system reliability and on the maintenance costs of both
strictly periodic and random periodic maintenance or
replacement policies for an essentially infinite usage
period. The operating characteristic of random periodic
policies is determined by Flehinger [64]. Derman and
Sacks [52] obtain the optimal replacement policy for a
piece of equipment in which the decision to replace
depends upon the observed state of the equipment
deterioration at specified points in time. The derivation of an
optimum periodic maintenance interval corresponding to a
given finite span is basically a much more difficult
problem. Barlow and Proschan [11] prove the existence
of such an optimal policy. Further, they carefully expose
the strictly periodic and random periodic maintenance
problems, and have shown that for an infinite time horizon
there always exists a strictly periodic maintenance
policy which is superior to a random policy [12].

Meyers and Pick [120] have studied the effects of
preventive maintenance on availability for a system
composed of similar components where at least n out of m
components must operate for the system to function.
Nakagawa and Osaki [132] have dealt with optimal preventive
maintenance policies to maximize the availability for
a 2-unit redundant system.

Optimal Allocation of Availability Parameters

Because a high degree of complexity is involved in
many modern-day systems, much interest has been
shown in allocating the availability parameters at
component levels in the early stages of system design. The
critical problem is to determine those parameters from a
design, redesign or operating point of view so that some
measure such as cost or weight of the system is minimized
while a system availability requirement is met. Various
combinations of availability parameters are used as
decision variables in the allocation problem (see Table 5.2).

TABLE 5.2


AVAILABILITY PARAMETERS
(decision variables in the model)

Availability Parameters               References
------------------------------------  ----------------------------
MTBF and MTTR                         56, 119, 145, 164, 190, 193
Numbers of redundant components       94, 138, 157
MTBF, MTTR, and number of             75, 110
  redundant components
Failure rate, repair rate, and        39, 175, 176
  preventive maintenance period
Failure rate, mean corrective         112
  maintenance time, mean preventive
  maintenance time, and age for
  preventive maintenance

The optimization techniques employed for the
availability allocation problem are summarized in Table 5.3.

TABLE 5.3

OPTIMIZATION TECHNIQUE EMPLOYED FOR
AVAILABILITY ALLOCATION

Optimization Technique      References
--------------------------  --------------------------
Dynamic programming         94, 110, 157, 164, 190
Integer programming         160
Geometric programming       56
Lagrange multipliers        75, 119, 164, 176
SUMT                        38, 39, 112, 175

The tradeoff technique between reliability and
maintainability is discussed by Goldman and Whitin [75].
They employed Lagrange multipliers and show how the
availability parameters consistent with the minimum cost
operation and the specified system availability can be
calculated. Kabak [56] has used geometric programming
to determine the optimal design parameters that minimize
total system cost.

Johnson [94] presents a methodology for optimizing
the cost function under the predetermined availability
level. McNichols and Messer, Jr. [119] have employed a
cost-based procedure for allocating the availability
parameters at the component level. The allocation problem
is expressed as the minimization of the total improvement
cost, subject to the constraint of meeting the system
availability goal, and is solved using the Lagrange
multipliers method. This allocation technique is applicable to
systems which can be described as a series model; that is,
all components are necessary for proper system
functioning. Extension to other models has not been considered
although it appears feasible and would greatly expand the
usefulness and application areas of the allocation
problem.
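
To make the Lagrange multiplier formulation concrete, the following sketch allocates component availabilities $A_i$ for a series system so that the product of the $A_i$ meets a required system availability at minimum total cost. The cost function $k_i A_i / (1 - A_i)$, the coefficients `k_values`, and the function names are assumptions chosen for illustration; they are not the cost equations of Reference 119. Writing the constraint as $\sum_i \ln A_i = \ln A_{req}$, the stationarity condition is $k_i/(1-A_i)^2 = \nu/A_i$, which is solved for each component and then the multiplier $\nu$ is found by bisection.

```python
import math


def component_availability(nu, k):
    """Solve k*A = nu*(1 - A)**2 for the root A in (0, 1)."""
    b = 2.0 * nu + k
    # Rationalized form of the smaller quadratic root; avoids cancellation.
    return 2.0 * nu / (b + math.sqrt(b * b - 4.0 * nu * nu))


def allocate(k_values, a_required, iterations=200):
    """Bisect on the multiplier nu until the series product meets the goal."""
    lo, hi = 1e-12, 1e12
    for _ in range(iterations):
        nu = math.sqrt(lo * hi)  # geometric midpoint, nu spans many decades
        product = math.prod(component_availability(nu, k) for k in k_values)
        if product < a_required:
            lo = nu
        else:
            hi = nu
    return [component_availability(hi, k) for k in k_values]


def total_cost(k_values, a_values):
    """Assumed improvement cost: sum of k_i * A_i / (1 - A_i)."""
    return sum(k * a / (1.0 - a) for k, a in zip(k_values, a_values))


if __name__ == "__main__":
    k_values = [1.0, 2.0, 4.0]  # hypothetical cost coefficients
    allocation = allocate(k_values, a_required=0.95)
    print("allocated availabilities:", [round(a, 5) for a in allocation])
    print("system availability:", round(math.prod(allocation), 6))
    print("total cost:", round(total_cost(k_values, allocation), 3))
```

Under this assumed cost model, components that are more expensive to improve receive a lower allocated availability, and the total cost is no higher than that of giving every component the same availability.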

It is also assumed that the individual components
exhibit constant failure rates and that failures occur
independently. The removal of these assumptions would
generalize the allocation procedure and certainly make it
more realistic. However, without the constant failure
rate assumption, analytic solutions are usually not
feasible and often impossible. The effects of various
modes of failure could be investigated by careful analysis
and prediction of possible failure patterns, and
subsequent determination of the effect of these on the system
availability.

The  cost  equations  used  in  this  development 
describe  the  costs  associated  with  the  improvement  of 
component  failure  rates  and  repair  times  from  achieved 
levels.  Thus,  the  availability  requirement  is  attained  in 
the  manner  that  requires  the  least  cost  in  improvement  of 
design  and  equipment.  Although  this  problem  is  important 
to  design  and  development  groups,  the  allocations  should 
be  made  on  the  basis  of  minimizing  the  cost  of  the  system 
throughout  its  life.  In  this  respect,  the  cost  equations 
could  be  expanded  to  include  the  effects  of  component 
allocations  on  such  costs  of  system  ownership  as  sparing 
and downtime. The ultimate goal would be to allocate to
the system components the levels of reliability and
maintainability that would minimize the overall total system
lifetime costs.

Shershin [164] has dealt with mathematical means
for optimizing the simultaneous apportionments of
reliability and maintainability by means of both the Lagrange
multipliers technique and dynamic programming.

Wilkinson and Walvekar [190] have also used
dynamic programming for optimally allocating availability
to a multicomponent system. They determine the MTBF and
MTTR that minimize the system cost under the minimum
availability requirement. As an extension of this study,
Lambert et al. [110] present a method for determining the
optimum MTBF, MTTR, and the number of redundant
components for a multistage system to achieve a given
availability at minimum cost by dynamic programming.

Tillman and Chatterjee [175] have studied the
problem of allocating the failure rate, repair rate, and
preventive maintenance period to each component of the
system consisting of n subsystems in series where each
subsystem has two identical components in parallel. An
extension of this study can be seen in Reference 112, in
which availability parameters consist of failure rate,
mean corrective maintenance time, mean preventive
maintenance time, and age for preventive maintenance of each
component. Furthermore, a general series-parallel system
configuration is considered. In both studies, the
sequential unconstrained minimization technique (SUMT), which
incorporates the Hooke and Jeeves pattern search and
heuristic programming, is employed.


In Reference 38, both the availability and the mean
cycle time are treated as constraints of the system, rather
than the availability alone. The objective is to minimize
the system cost, including the recurring and nonrecurring
costs. In this study only the age replacement policy is
considered, but the approach can be readily
applied to other replacement policies. The problem is
formulated and solved as a nonlinear programming problem.

Lie [112] studied the optimal availability allocation
problem for a series-parallel system consisting of
subsystems in series, where each subsystem has identical
units in parallel having various probability density
functions for failure and repair times of each unit. In
developing the availability models, two types of
maintenance policies for each subsystem are considered. The
corrective maintenance is performed when the subsystem
fails due to the failure of all redundant units and the
preventive maintenance is scheduled at a fixed age of the
subsystem and is actually performed only if the subsystem
has not failed before this fixed age.

Preventive maintenance action consists of replacing
or repairing only the failed units if each unit has a
constant failure rate, and replacing both failed and
unfailed units if each unit has an increasing failure rate
with time. Thus, each subsystem is assumed to be fully
restored after the completion of either corrective or
preventive maintenance. The cost of the system consists
of three components: the cost for designing the mean time
between maintenance and mean corrective and preventive
maintenance time, the cost for corrective maintenance, and
the cost for preventive maintenance. The optimal
availability allocation problem is then to determine individual
unit availability specifications which will minimize the
total cost of the system under the constraint of meeting
the system availability requirement. Both the cost
function and the availability equation of the system are
nonlinear; the optimization methods employed to solve this
problem are the generalized reduced gradient (GRG)
method and the sequential unconstrained minimization
technique (SUMT).


Summary  and  Recommendations 

Summary 

This thesis presents the results of an extensive
literature review on availability of maintained systems.
In Chapter II the different concepts and definitions of
availability are discussed; then a survey of the basic
elements of availability is made to include the failure
process, repair process and system configuration. The
references are classified according to the last three
elements. In Chapter III the different approaches used in
obtaining availability models are discussed. In
Chapter IV many availability models using the Markovian
approach are presented. In Chapter V the effect of
preventive maintenance policies on availability is explained and
a classification of the availability parameters used in the
model and of system optimization is presented.

Recommendations 

While this survey covers a wide variety of topics
on availability, there are some interesting areas for
future research. One of the major areas is the situation
when the Markovian conditions are not met or not
approximately met and non-Markovian models must be used.
Development in this area would permit the use of distributions
other than the exponential. The whole area of non-perfect
switching needs to be studied. The perfect switching
models are the easiest to develop but, in practice,
non-perfect switching cases are encountered.